Deconstructing the law of effect
Abstract
Do the consequences of past behavior alter future policy, as the law of effect assumes? Or are behavioral policies based on behaviorally produced information about the state of the world, but not themselves subject to change? In the first case, stable policies are equilibria discovered by trial and error, so adjustments to abrupt changes in the environment must proceed slowly. In the second, adjustments can be as abrupt as the environmental changes. Matching behavior is the robust tendency of subjects to match the relative time and effort they invest in different foraging options to the relative incomes derived from them. Measurements of the time course of adjustments to step changes in the reward-scheduling environment show that adjustments can be as abrupt as the changes that drive them, and can occur with the minimum possible latency. Broader implications for theories about the role of experience in behavior are discussed.
Economists and psychologists commonly assume that behavior is shaped by its consequences. Psychologists call this the law of effect, by which they understand that we and other animals try different behaviors, assess their effects, and do more of those with better effects and less of those with worse. On this view, the behaviorally important consequence of a behavior is the information it provides about behavioral outcomes. The effect of the information is to alter policy.
There is, however, a different way in which the consequences of behavior may shape future behavior. Some policies depend for their execution on information about the state of the world. Changing the information fed to a policy changes the behavior it generates. Because behavior generates information about the state of the world, the effects of past behavior may alter future behavior by changing the model of the world that a fixed policy takes as input.
Gallistel: Deconstructing law of effect Page 2
Attempts to distinguish between these possibilities are rare.
Three collaborators and I (Gallistel, Mark et al. 2001) have recently distinguished between them in studying the genesis of matching behavior, which is the robust tendency of animals, human and otherwise, to prorate their behavioral investments in different foraging options so that the investment proportions match the income proportions (Herrnstein 1961; Harper 1982; Godin and Keenleyside 1984; Davison and McCarthy 1988; Herrnstein 1991). If the subject gets 2/3 of its income from foraging in one location and 1/3 from foraging in another, then it spends 2/3 of its foraging time in the first and 1/3 in the other.
In the laboratory, matching behavior is most commonly studied using what is called free operant behavior with concurrent variable interval schedules of reinforcement. In this paradigm, subjects have two different reward-generating response options. Typically, if they are pigeons, the options are two different keys, either of which may be pecked; if they are rats, the options are two different levers, either of which may be pressed. The rewards for pecking or pressing are typically small amounts of food. The behaviors are called free operants because they operate on the environment to produce reward and because the subject's opportunity to engage in them is not constrained. Subjects can make either response whenever they like (see Note 1).
A schedule of reinforcement is the experimenter-determined function relating investment to reward. In a variable interval schedule, the next reward delivered by a response on one of the two options is set up at a randomly varied interval after the harvesting of the last reward from that option. Once set up, the reward remains available until it is harvested by the next response. The expected interval to the next set-up distinguishes one variable interval schedule from another.
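The variable interval schedule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text says the set-up interval is "randomly varied" without naming a distribution, so the exponential draw is an assumption, and the names `VariableIntervalSchedule` and `count_rewards` are hypothetical.

```python
import random


class VariableIntervalSchedule:
    """One variable interval (VI) schedule for a single response option.

    Assumption: set-up intervals are exponentially distributed; the text
    only says that they are randomly varied.
    """

    def __init__(self, mean_interval, rng):
        self.mean_interval = mean_interval  # expected interval to next set-up
        self.rng = rng
        self.setup_time = self._next_setup(0.0)

    def _next_setup(self, now):
        # The timer for the next reward starts when the last one is harvested.
        return now + self.rng.expovariate(1.0 / self.mean_interval)

    def respond(self, now):
        """Return True if a response at time `now` harvests a reward."""
        if now >= self.setup_time:
            # A set-up reward stays available until this response collects it.
            self.setup_time = self._next_setup(now)
            return True
        return False


def count_rewards(schedule, response_gap, duration):
    """Respond every `response_gap` seconds and count harvested rewards."""
    rewards, now = 0, 0.0
    while now < duration:
        if schedule.respond(now):
            rewards += 1
        now += response_gap
    return rewards
```

Two such objects with different `mean_interval` values, both consulted against the same clock, would stand for a pair of concurrent schedules.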
[Note 1: For matching to occur, a minimal amount of time (on the order of a second or two) must be lost in shifting between the options. Otherwise, subjects can in effect exercise both options at once (play both machines simultaneously), which is what they do.]
The schedules for the two response options are concurrent when they run in parallel, with the timers for both schedules running regardless of which option the subject is exercising at the moment. The fact that the reward-scheduling function is called a schedule of reinforcement suggests the extent to which the law of effect is taken for granted by psychologists. It is assumed that rewards act to strengthen rewarded behaviors, that is, to make them relatively more likely to occur. In a purely descriptive sense, of course, they do: the shorter the expected set-up interval for one schedule is, relative to the other, the more likely it is that at any given moment the subject will be investing in that option. It does not follow, however, that the schedule of reinforcement affects the subject's behavior by way of an effect on the subject's policy.
Subjects may have a fixed policy for translating relative expected incomes into relative investments. In that case, what they get from responding is not policy guidance but rather an estimate of the income to be expected. The income from an option is the amount of reward it yields per unit of time tout court, not per unit of time invested. Provided that the subject samples an option at intervals that are on average shorter than the expected interval to the next reward, the income from a variable interval schedule is only weakly affected by the size of the subject's investment. In short, to experience the income from an option a subject must spend some time exercising that option, but, within broad limits, the amount of time it spends has little effect on the income it yields.
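The weak dependence of income on investment can be checked with a small simulation. The sketch below is illustrative only: the exponential set-up intervals, the fixed stay/away cycle, the one-response-per-second rate, and the function name `income_per_second` are all assumptions made for the example, not details from the experiments.

```python
import random


def income_per_second(mean_interval, stay, away, total_time=200_000.0, seed=0):
    """Rewards per second of session time (not per second invested).

    The subject alternates `stay` seconds on the option, responding once
    per second, with `away` seconds elsewhere. The schedule timer keeps
    running during absences, and a set-up reward waits to be harvested.
    """
    rng = random.Random(seed)
    setup_time = rng.expovariate(1.0 / mean_interval)
    rewards = 0
    t = 0.0
    while t < total_time:
        visit_end = t + stay
        while t < visit_end:
            if t >= setup_time:  # a reward is waiting: harvest it
                rewards += 1
                setup_time = t + rng.expovariate(1.0 / mean_interval)
            t += 1.0
        t += away  # time spent on the other option
    return rewards / t


if __name__ == "__main__":
    low = income_per_second(30.0, stay=2.0, away=8.0)   # 20% of time invested
    high = income_per_second(30.0, stay=8.0, away=2.0)  # 80% of time invested
    print(f"income at 20% investment: {low:.4f} rewards/s")
    print(f"income at 80% investment: {high:.4f} rewards/s")
```

In this toy setting a fourfold increase in investment changes the income only modestly, because the set-up interval, not the visit pattern, is the limiting factor.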
The subject's behavior reveals, so to speak, the income that may be obtained from a given option. There is some reason to think that matching might be an innate policy, because both human and pigeon subjects pursue it even under circumstances where it is the worst policy, the policy that minimizes their overall return (Herrnstein 1991). At the very least, this implies that there are limits to the ability of response consequences to shape policy.
Two Contrasting Accounts
One attraction of using the matching phenomenon to address the role of past behavioral consequences in determining future behavior is that it permits a clear formulation of the alternative accounts. The first account involves what Herrnstein called melioration, which is "the process of comparing the rates of return and shifting toward the alternative that is currently yielding the better return" (Herrnstein and Prelec 1991, p. 361). Some version of this idea has been the basis for most attempts to explain matching behavior, although none of these attempts has succeeded in specifying the details in such a way as to yield a model that captures the details of the behavior (Lea and Dow 1984; Herrnstein and Prelec 1991). A particularly vexing problem has been the specification of the interval over which subjects average when estimating their returns. The wider this averaging window, the more slowly subjects will approach the new stable equilibrium when the relative richness of the schedules changes. Melioration models have never been able to specify an empirically defensible averaging window (Lea and Dow 1984).
What melioration models have in common is the assumption that matching is not itself the policy. The policy is whatever melioration leads to. Matching is what melioration leads to when there are variable interval schedules of reinforcement, because in that environment, matching equates returns.
Because subjects sample both options at intervals shorter than the expected intervals between rewards, the income-limiting factor is the expected set-up interval of the schedule. Increasing or decreasing the expected duration of a visit (hence, the average investment in an option) has little effect on the income realized from it. Return is income divided by investment. Therefore, increasing the investment in the richer option and decreasing the investment in the poorer decreases the return from the richer and increases the return from the poorer. When the investment ratio matches the income ratio, the returns are equal. Matching is the dynamic equilibrium point, the point at which the consequences of behavior (the returns) do not favor a shift toward either option. Any drift away from this point produces a countervailing inequality in returns, which drives the behavior back toward matching.
The alternative account assumes that matching is the policy and that it is an immutable policy (Gallistel and Gibbon 2000; Gallistel, Mark et al. 2001). In accord with the experimental findings on the microstructure of matching behavior (Heyman 1979; Gibbon 1995), this model assumes that visits to the options are terminated by a Poisson (random rate) process. When a visit has begun, subjects, in effect, repeatedly flip a biased coin to decide when to leave (that is, to temporarily stop exercising the option). When the coin comes up heads, they leave. This assumption has two consequences: First, the distribution of visit durations should be exponential, which it is (Heyman 1979; Gibbon 1995). This is odd if one believes that the function relating behavior to its consequences shapes behavior, because the reward for trying an option becomes more certain the longer a subject has neglected it.
Thus, the probability of terminating a visit to try the other option should increase as the visit is prolonged; the longer the subject has been there, the more likely it should be to leave. This, however, is empirically false; the probability of terminating a visit does not change as the visit is prolonged, which is why visit durations are exponentially distributed. Second, the expected duration of a visit is determined by the bias on the coin, the rate at which it comes up heads. This parameter of the subject's behavior is assumed to be determined by its estimates of the expected incomes from the available options in accord with the following two equations:
\[ \frac{E(d_1)}{E(d_2)} = \frac{\hat{H}_1}{\hat{H}_2} \tag{1} \]
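The two consequences of the Poisson leaving process can be illustrated numerically. The following is a minimal sketch, not the model's actual implementation: it uses only the ratio constraint of Equation (1), and the overall time scale `mean_cycle` is an arbitrary assumption, as are the function names `simulate_visits` and `leave_probability`.

```python
import random


def simulate_visits(income_1, income_2, mean_cycle=10.0, n_cycles=50_000, seed=1):
    """Draw alternating visit durations from a Poisson leaving process.

    Assumption: expected visit durations are split in proportion to the
    estimated incomes, as in Equation (1). `mean_cycle`, the sum of the
    two expected durations, is an arbitrary illustrative scale.
    """
    rng = random.Random(seed)
    total = income_1 + income_2
    mean_d1 = mean_cycle * income_1 / total
    mean_d2 = mean_cycle * income_2 / total
    # A constant leaving rate yields exponentially distributed durations.
    d1 = [rng.expovariate(1.0 / mean_d1) for _ in range(n_cycles)]
    d2 = [rng.expovariate(1.0 / mean_d2) for _ in range(n_cycles)]
    return d1, d2


def leave_probability(durations, elapsed, window=1.0):
    """Estimate P(visit ends within `window` | it has lasted `elapsed`)."""
    survivors = [d for d in durations if d >= elapsed]
    ended = sum(1 for d in survivors if d < elapsed + window)
    return ended / len(survivors)


if __name__ == "__main__":
    d1, d2 = simulate_visits(income_1=2.0, income_2=1.0)
    share = sum(d1) / (sum(d1) + sum(d2))
    print(f"share of time on option 1: {share:.3f}")
    print(f"P(leave) just after arrival: {leave_probability(d1, 0.0):.3f}")
    print(f"P(leave) after 10 s:         {leave_probability(d1, 10.0):.3f}")
```

With incomes in a 2:1 ratio, the simulated time share on the first option comes out close to 2/3, and the estimated probability of leaving is about the same early and late in a visit, which is the signature of an exponential distribution.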
Journal: Games and Economic Behavior
Volume: 52
Publication date: 2005